{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "8GLCM_xrxJdc",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Let's start by importing Pandas. Click on a cell to activate it. Press Shift + Enter to execute. Cells should be executed in order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "U_5zMEGZL96I"
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7qQ1n2fkx234"
   },
   "source": [
    "## Load / Create Data Frame\n",
    "We can read CSV or excel files to import data. We can also create dataframes programatically like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "yyFj31jcMmH6"
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame({'Artist':['Billie Holiday','Jimi Hendrix', 'Miles Davis', 'SIA'],\n",
    "              'Genre': ['Jazz', 'Rock', 'Jazz', 'Pop'],\n",
    "              'Listeners': [1300000, 2700000, 1500000, 2000000],\n",
    "              'Plays': [27000000, 70000000, 48000000, 74000000]})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wXj4OuOrxwJm"
   },
   "source": [
    " The variable `df` now contains a dataframe. Executing this cell shows the contents of the dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 173
    },
    "colab_type": "code",
    "id": "WF4_LVYOO__m",
    "outputId": "a7294a6d-6217-4c77-ff41-3e5bcb16e8f8"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Billie Holiday</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1300000</td>\n",
       "      <td>27000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "      <td>Rock</td>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Miles Davis</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1500000</td>\n",
       "      <td>48000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SIA</td>\n",
       "      <td>Pop</td>\n",
       "      <td>2000000</td>\n",
       "      <td>74000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Artist Genre  Listeners     Plays\n",
       "0  Billie Holiday  Jazz    1300000  27000000\n",
       "1    Jimi Hendrix  Rock    2700000  70000000\n",
       "2     Miles Davis  Jazz    1500000  48000000\n",
       "3             SIA   Pop    2000000  74000000"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6LUcXFMKyF6z"
   },
   "source": [
    "## Selection\n",
    "We can select any column using its label:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 102
    },
    "colab_type": "code",
    "id": "sTNHtQz3Mu_K",
    "outputId": "de1bf82f-cd3d-4753-efc9-b86928756606",
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    Billie Holiday\n",
       "1      Jimi Hendrix\n",
       "2       Miles Davis\n",
       "3               SIA\n",
       "Name: Artist, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Artist']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UeH1ClC3yOJy"
   },
   "source": [
    "We can select one or multiple rows using their numbers (inclusive of both bounding row numbers):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "colab_type": "code",
    "id": "41MpPCPwM0AR",
    "outputId": "7c36a085-5c6c-48b6-b602-9de9ca06be9e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "      <td>Rock</td>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Miles Davis</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1500000</td>\n",
       "      <td>48000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SIA</td>\n",
       "      <td>Pop</td>\n",
       "      <td>2000000</td>\n",
       "      <td>74000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Artist Genre  Listeners     Plays\n",
       "1  Jimi Hendrix  Rock    2700000  70000000\n",
       "2   Miles Davis  Jazz    1500000  48000000\n",
       "3           SIA   Pop    2000000  74000000"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 80
    },
    "colab_type": "code",
    "id": "fhOMUtTNPnc0",
    "outputId": "41b574a4-b467-42cb-b82e-1505d3f788fe"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Billie Holiday</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1300000</td>\n",
       "      <td>27000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Artist Genre  Listeners     Plays\n",
       "0  Billie Holiday  Jazz    1300000  27000000"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise: Select only row #0\n",
    "df.loc[0:0]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7eeFjuAUyTDj"
   },
   "source": [
    "We can select any slice of the table using a both column label and row numbers using loc:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "p7EPN96RPHQt"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Miles Davis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SIA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Artist\n",
       "1  Jimi Hendrix\n",
       "2   Miles Davis\n",
       "3           SIA"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[1:3,['Artist']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ov_6TtCvPubJ"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Miles Davis</td>\n",
       "      <td>48000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Artist     Plays\n",
       "1  Jimi Hendrix  70000000\n",
       "2   Miles Davis  48000000"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise: Select 'Artist' and 'Plays' for rows #1 and #2\n",
    "df.loc[1:2,['Artist','Plays']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "GeuS62DoyXlJ"
   },
   "source": [
    "## Filtering\n",
    "Now it gets more interesting. We can easily filter rows using the values of a specific row. For example, here are our jazz musicians:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ax8qDMHfPQmP"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Billie Holiday</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1300000</td>\n",
       "      <td>27000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Miles Davis</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1500000</td>\n",
       "      <td>48000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Artist Genre  Listeners     Plays\n",
       "0  Billie Holiday  Jazz    1300000  27000000\n",
       "2     Miles Davis  Jazz    1500000  48000000"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['Genre'] == \"Jazz\" ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Surgh6XVRmlT"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "      <td>Rock</td>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Artist Genre  Listeners     Plays\n",
       "1  Jimi Hendrix  Rock    2700000  70000000"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise: Select the row where the Genre is 'Rock'\n",
    "df[df['Genre']==\"Rock\"]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "a33J488lyeFj"
   },
   "source": [
    "Here are the artists who have more than 1,800,000 listeners:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "H1Ld-RXQPTLL"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "      <td>Rock</td>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SIA</td>\n",
       "      <td>Pop</td>\n",
       "      <td>2000000</td>\n",
       "      <td>74000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Artist Genre  Listeners     Plays\n",
       "1  Jimi Hendrix  Rock    2700000  70000000\n",
       "3           SIA   Pop    2000000  74000000"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['Listeners'] > 1800000 ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "8XLaLBKORva4"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Genre</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Billie Holiday</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1300000</td>\n",
       "      <td>27000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Miles Davis</td>\n",
       "      <td>Jazz</td>\n",
       "      <td>1500000</td>\n",
       "      <td>48000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Artist Genre  Listeners     Plays\n",
       "0  Billie Holiday  Jazz    1300000  27000000\n",
       "2     Miles Davis  Jazz    1500000  48000000"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise: Select the rows where 'Plays' is less than 50,000,000\n",
    "\n",
    "df[df['Plays'] < 50000000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "MPeYsXStyuwA"
   },
   "source": [
    "## Grouping"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "qPG_jFSdPMF5"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Genre</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jazz</th>\n",
       "      <td>2800000</td>\n",
       "      <td>75000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pop</th>\n",
       "      <td>2000000</td>\n",
       "      <td>74000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rock</th>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Listeners     Plays\n",
       "Genre                     \n",
       "Jazz     2800000  75000000\n",
       "Pop      2000000  74000000\n",
       "Rock     2700000  70000000"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Genre').sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "kb5smli-PVE3"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Genre</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jazz</th>\n",
       "      <td>1400000</td>\n",
       "      <td>37500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pop</th>\n",
       "      <td>2000000</td>\n",
       "      <td>74000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rock</th>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Listeners     Plays\n",
       "Genre                     \n",
       "Jazz     1400000  37500000\n",
       "Pop      2000000  74000000\n",
       "Rock     2700000  70000000"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise: Group by Genre, use mean() as the aggregation function \n",
    "df.groupby('Genre').mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "g2IFgCF4Rg48"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Artist</th>\n",
       "      <th>Listeners</th>\n",
       "      <th>Plays</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Genre</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Jazz</th>\n",
       "      <td>Miles Davis</td>\n",
       "      <td>1500000</td>\n",
       "      <td>48000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pop</th>\n",
       "      <td>SIA</td>\n",
       "      <td>2000000</td>\n",
       "      <td>74000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Rock</th>\n",
       "      <td>Jimi Hendrix</td>\n",
       "      <td>2700000</td>\n",
       "      <td>70000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Artist  Listeners     Plays\n",
       "Genre                                   \n",
       "Jazz    Miles Davis    1500000  48000000\n",
       "Pop             SIA    2000000  74000000\n",
       "Rock   Jimi Hendrix    2700000  70000000"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Exercise: Group by Genre, use max() as the aggregation function \n",
    "df.groupby('Genre').max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 增加练习内容"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = pd.Series([1, 3, 5, np.nan, 6, 8])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    3.0\n",
       "2    5.0\n",
       "3    NaN\n",
       "4    6.0\n",
       "5    8.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "dates = pd.date_range('20130101', periods=6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
       "               '2013-01-05', '2013-01-06'],\n",
       "              dtype='datetime64[ns]', freq='D')"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## np.random.randn(6,4)是什么意思？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2013-01-01</th>\n",
       "      <td>0.704434</td>\n",
       "      <td>-0.585204</td>\n",
       "      <td>0.434844</td>\n",
       "      <td>0.802965</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-01-02</th>\n",
       "      <td>2.125107</td>\n",
       "      <td>0.082478</td>\n",
       "      <td>-1.185925</td>\n",
       "      <td>1.427529</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-01-03</th>\n",
       "      <td>0.076599</td>\n",
       "      <td>-0.104691</td>\n",
       "      <td>0.191124</td>\n",
       "      <td>-0.339191</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-01-04</th>\n",
       "      <td>-0.056054</td>\n",
       "      <td>1.576477</td>\n",
       "      <td>-0.316477</td>\n",
       "      <td>-0.418990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-01-05</th>\n",
       "      <td>1.197152</td>\n",
       "      <td>-0.161828</td>\n",
       "      <td>-0.786203</td>\n",
       "      <td>0.076950</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2013-01-06</th>\n",
       "      <td>-0.853178</td>\n",
       "      <td>0.939005</td>\n",
       "      <td>-2.181424</td>\n",
       "      <td>-1.867079</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   A         B         C         D\n",
       "2013-01-01  0.704434 -0.585204  0.434844  0.802965\n",
       "2013-01-02  2.125107  0.082478 -1.185925  1.427529\n",
       "2013-01-03  0.076599 -0.104691  0.191124 -0.339191\n",
       "2013-01-04 -0.056054  1.576477 -0.316477 -0.418990\n",
       "2013-01-05  1.197152 -0.161828 -0.786203  0.076950\n",
       "2013-01-06 -0.853178  0.939005 -2.181424 -1.867079"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Pandas Intro.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
